Autotuning of FFTW Library for Massively Parallel Supercomputers Scalability Improvements for DFT Codes Due to the Implementation of the 2D Domain Decomposition Algorithm

Authors

  • Massimiliano Guarrasi
  • Sandro Frigio
  • Andrew Emerson
  • Giovanni Erbacci
Abstract

In this paper we present part of the work carried out by CINECA within the PRACE-2IP project to study the performance effect of implementing a 2D Domain Decomposition algorithm in DFT codes that use the standard 1D (or slab) Parallel Domain Decomposition. The performance of the new algorithm is tested on two example applications: Quantum Espresso, a popular code used in materials science, and BlowupNS, a CFD code. The first part of the paper presents the codes used; the last part shows the performance increase obtained with the new algorithm.
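
To illustrate why moving from a 1D (slab) to a 2D decomposition can matter for scalability, the following is a minimal sketch, not the authors' implementation. It assumes a cubic grid of `n` points per side split into contiguous blocks; the helper names (`split`, `slab_shapes`, `pencil_shapes`, `idle_ranks`) are hypothetical. With slabs, a rank owns whole planes, so at most `n` ranks can hold data; with a 2D process grid, up to `n * n` ranks can.

```python
def split(n, parts):
    """Sizes of `parts` contiguous blocks covering n points, as evenly as possible."""
    base, extra = divmod(n, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def slab_shapes(n, ranks):
    """1D (slab) decomposition of an n^3 grid: each rank owns whole planes."""
    return [(nx, n, n) for nx in split(n, ranks)]


def pencil_shapes(n, p_row, p_col):
    """2D decomposition over a p_row x p_col process grid (illustrative assumption):
    each rank owns a block that spans the full length of one axis."""
    return [(n, ny, nz) for ny in split(n, p_row) for nz in split(n, p_col)]


def idle_ranks(shapes):
    """Number of ranks whose local block is empty."""
    return sum(1 for shape in shapes if 0 in shape)


if __name__ == "__main__":
    n, ranks = 64, 256
    print("slab idle ranks:  ", idle_ranks(slab_shapes(n, ranks)))
    print("pencil idle ranks:", idle_ranks(pencil_shapes(n, 16, 16)))
```

In this toy setting, asking for more ranks than grid planes leaves slab ranks without data, while the 2D layout still gives every rank a non-empty block. Real codes add communication costs for the transposes between layouts, which this sketch does not model.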

Similar resources

Scalable Techniques for Computing Band Linear Recurrences on Massively Parallel and Vector Supercomputers

In this paper, we present a new scalable algorithm, called the Regular Schedule, for parallel evaluation of band linear recurrences (BLRs, i.e., mth-order linear recurrences for m ≥ 1). Its scalability and simplicity make it well suited for vector supercomputers and massively parallel computers. We describe our implementation of the Regular Schedule on two types of machines: the Convex C240 and ...

Fourier Transforms for the BlueGene/L Communication Network

A computational kernel of particular importance for many scientific applications is the Fast Fourier Transform (FFT) of multi-dimensional data. A fundamental challenge is the design and implementation of such parallel numerical algorithms to utilise efficiently thousands of nodes. The BlueGene/L is a massively parallel high performance computer organised as a three-dimensional torus of compute ...

New Parallel Matrix Multiplication Algorithms for Wormhole-Routed All-Port 2D/3D Torus Networks

New matrix multiplication algorithms are proposed for massively parallel supercomputers with 2D/3D, all-port torus interconnection networks. The proposed algorithms are based on the traditional row-by-column multiplication matrix product model and employ a special routing pattern for better scalability. They compare favorably to the variants of Cannon’s and DNS algorithms since they allow matri...

Protein Folding with Python on Supercomputers

Today’s supercomputers have hundreds of thousands of compute cores and this number is likely to grow. Many of today’s algorithms will have to be rethought to take advantage of such large systems. New algorithms must provide fine grained parallelism and excellent scalability. Python offers good support for numerical libraries and offers bindings to MPI that can be used to develop parallel algori...

A New Parallel Matrix Multiplication Algorithm for Wormhole-Routed All-Port 2D/3D Torus Networks

A new matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D/3D, all-port torus interconnection networks. The proposed algorithm is based on the traditional row-by-column multiplication matrix product model and employs a special routing pattern for better scalability. It compares favorably to the variants of Cannon’s and DNS algorithms since it allows matrices...

Publication date: 2013